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1. INTRODUCTION 

With recent developments in lattice QCD, a 
wide range of high-precision calculations is now 
possible. To achieve high precision we need tight 
control of all the systematic errors inherent in the 
analysis of simulation data. A particularly irri- 
tating source of such systematic errors has been 
curve fitting, which plays a central role in almost 
all applications of lattice simulations. 

Lattice theorists use curve fitting in many dif- 
ferent ways: we use it, for example, to extract 
hadronic masses and matrix elements from Monte 
Carlo estimates of correlators, to extrapolate 
light-quark masses to physical values, and to fit 
short-distance Monte Carlo results to perturba- 
tive expansions. In each such case the theory we 
fit to Monte Carlo data has an infinite number of 
parameters. In order to fit such a theory using a 
finite amount of data, we normally truncate both 
the data set and the theory. Thus we might re- 
tain only data for the smallest quark masses, and 
extrapolate linearly to physical masses. Or we 
might fit only the large-t behavior of a hadronic 
correlator, ignoring contributions from all but the 
lowest-mass hadron. Such truncations are usu- 
ally necessary to obtain good fits with reasonable 
error estimates for the fit parameters, but they 



necessarily increase both the statistical and sys- 
tematic uncertainties in the results. Statistical 
uncertainties are increased because Monte Carlo 
data is discarded. Systematic uncertainties are 
introduced by omitting parts of the theory that 
could conceivably be significant. Such systematic 
effects have proven particularly difficult to esti- 
mate in past analyses, and these are the principle 
focus of this article. 

In this paper we show how to circumvent the 
shortcomings of the traditional approach by using 
constrained curve fitting, a simple modification 
of standard maximum likelihood techniques [Q . 
Constrained curve fits, while widely used in other 
research areas, are not used much by the lattice 
QCD community. Here we will show how they can 
be adapted for use in lattice calculations, and we 
will document their striking advantages over tra- 
ditional methods. Constrained curve fits provide 
an elegant procedure for incorporating systematic 
uncertainties due to underconstrained parts of a 
theory (high-energy states, for example). Fur- 
thermore they allow us to fit much more data — 
for example, correlators down to t = 0. And they 
are numerically more robust than unconstrained 
fits. Finally the formalism for constrained curve 
fitting can be used to estimate the lattice spacings 
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and quark masses that will minimize the errors in 
a large-scale simulation. 

2. CONSTRAINED FITS 

2.1. The Problem 

The central problem is illustrated by the analy- 
sis of a meson correlator. Simulations generate a 
Monte Carlo estimate, G(t), of the correlator for 
a finite number of time steps, say t = 0, 1 . . . 23. 
Theory tells us that the exact correlator has the 
form 



Gth(t; A n , E n ) 



E 

n=l 



A n e 



-E„t 



(1) 



where we assume that the energies E n are in or- 
der of increasing size. The challenge is to fit an 
infinite number of amplitudes A n and energies E n 
using only 24 G(t)'s. 

Traditional fits minimize x 2 (A„, E n ) by vary- 



ing A n and E n , where 
x 2 (A n ,£„)^AG(t) a M 2 ,AG(t'), 



/./' 



and 



AG(t) = G(t)-G th (t;A n ,E n ). 



(2) 



(3) 



The correlation matrix is estimated from the 
Monte Carlo: 
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G(t)G(t>) - G(t) G(t'). 



(4) 



Unfortunately this fitting procedure is singular 
here since there are more fit parameters, A n and 
E n , than data; the final uncertainties in the fit 
parameters are infinite. Additional information 
is needed if we are to proceed. 

The information we normally add is that the 
An's are well behaved, and therefore contribu- 
tions from high-energy states are suppressed at 
large t by the exponentials in the correlator, 
Eq. (||). Thus there is a i m i n above which only 
the first one or two terms in Gth make statisti- 
cally significant contributions. The standard pro- 
cedure therefore is to retain, say, only the first two 
terms in Gth and to fit them using only Monte 
Carlo results from t > t m i n . The trick is to find 
the best t m [ n . Choosing t m i n too small biases the 
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Figure 1. Fit values for the lowest two energies 
from a 2-term fit to a local-local T correlator 
using different t m i n 's. The correct values, from 
other analyses, are indicated by the dotted lines. 



E n 's and A„'s away from their true values, in- 
troducing systematic errors (because two terms 
is not enough in Gth)- Choosing t mlu too large 
gives statistical errors <JE n and a A n that are too 
large, since useful data is discarded. One typi- 
cally tries to increase £ m j n until the statistical er- 
rors mask any possible systematic error. Without 
a reliable quantitative estimate of the systematic 
error, however, any procedure for setting t nlln is 
necessarily ad hoc. 

To illustrate the dependence on t m i n , we plot 
results for E\ and E% from 2-term fits with var- 
ious £ m in's in Fig. |l]. The Monte Carlo data for 
these fits was obtained by averaging 840 T corre- 
lators evaluated at (quenched) /3 = 6 with local 
sources and sinks Q . The competition between 
large systematic errors for small i m i n and large 
statistical errors for large i m i n is particularly ap- 
parent for E% in this plot. 

2.2. A Solution 

Our goal should be to fit all the Monte Carlo 
data (t m in = 0) using as many terms as we wish 
in Gth- As we add more terms to Gth, how- 
ever, the errors on the leading parameters grow 
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Figure 2. Fit values for the two lowest energies 
from unconstrained fits with different numbers of 
terms in Gth- The correlator is a local-local T 
correlator and is fit for all t's. 



Figure 3. Fit values for the two lowest energies 
from constrained fits with different numbers of 
terms in Gth- The correlator is a local-local T 
correlator and is fit for all t's. 



steadily in a traditional analysis, as is evident in 
Fig. H The reason is easily understood. The 
large uncertainties in E\ and E 2 for the 8-term 
fit, for example, result because the parameters 
for higher-energy states are poorly constrained by 
the data and therefore wander off to unphysical 
values. Thus amplitude A4 ranges between five 
and ten times A\ in the 8-term fit, while quark 
models suggest that A4 is of order Ai or smaller. 
Since the allowed range for A4 affects the error 
estimates for other parameters, the errors on E\ 
and E 2 will be unreasonably large so long as the 
fitting code assumes that A4 « 10Ai is plausible. 
We need some way to teach physics to the fitting 
code. 

To constrain fit parameters to physically rea- 
sonable ranges, we augment the x 2 before mini- 
mizing: 

X * Xaug — X ~t~ Xprior' (^) 

where 

2 _ ( An ~ A n) 2 ST ( En ~ En ) 2 (a\ 

Aprior — ^ ~2 ^2 ' V u ) 

n A n n E n 

The extra terms in Xaue favor A n 's in the inter- 



val A n ± 17^4^ and E n 's in E n ± &E n . The A n 's, 
a A n 's . . . are inputs to the fitting procedure. We 
choose reasonable values for them on the basis of 
prior knowledge. This set of input parameters is 
referred to collectively as the "priors." 

Having chosen the priors, the procedure for a 
constrained fit is to minimize Xaug fitting all of 
the Monte Carlo data (t m [ n — 0). The number of 
terms in Gth is increased until fit results converge 
for the parameters of interest. Unlike t m i n , the 
number of terms in G t h need not be optimized; it 
is simply increased until results converge. This is 
illustrated by fit results for E\ and E 2 from our 
T data, which are plotted in Fig. || for fits with 
different numbers of terms. 

The numerics are greatly improved by the con- 
straints. For example, one can easily fit 100 terms 
in Gth to the T data, even though there are only 
24 data points. The fit results for all but the first 
few parameters simply reproduce the prior infor- 
mation in such a highly overparameterized fit. 

The error estimates for the fit parameters in 
our T fits automatically combine both the sta- 
tistical errors in the Monte Carlo data, and the 
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systematic errors due to our limited knowledge, 
as specified by the priors, about the poorly con- 
strained high-energy states. To see how much er- 
ror is due to each source we refit with all cr's dou- 
bled. Results for the energies change as follows: 



E x = 0.4526(15) 
E 2 = 0.683(49) 
E 3 = 1.05(12) 



0.4528(14) 

0.697(50) 

1.10(21). 



(7) 



Evidently E\ and E 2 are determined largely by 
the Monte Carlo data, while E 3 is strongly af- 
fected by the priors. The insensitivity of the lead- 
ing parameters to the priors is typical for high- 
quality data. The result for E 2 is impressive given 
that it comes from a single local-local correlator. 
(It also agrees with results obtained from multi- 
source/sink fits 0.) 

We actually parameterized our T fits in terms 
of parameters a n = \ogA n and e„ = log(_E„ — 
£7 n _i). This parameterization builds in a pri- 
ori requirements, A n > and E n > E n _\, which 
improve the fits. Using previous simulations 
as a guide, we chose priors that favored a n s» 
log 0.02 ± log 2 and e n » log 0.2 ± log 2 or: 



A n « 0.02 



-0.02 
-0.01 



E n — E n _i « 0.2 



-0.2 
-0.1 



(8) 



Our best 5-term fit to the T data is shown in 
Fig. ^, together with the data. The fit is excellent 
all the way down to t = 0: the minimum xiug di- 
vided by the number of data points is 0.8. (This 
is the correct ratio to examine since Xaug nas one 
extra term beyond those in x 2 for each fit pa- 
rameter.) While x 2 : Qi etc. play a crucial role 
in traditional fits, where t m i n must be optimized, 
they are of secondary importance in constrained 
curve fitting. The key criterion is convergence as 
the number of parameters is increased. If Xaug 
per data point is significantly larger than 1 af- 
ter convergence, then there is likely a mistake in 
either the data or the theory. 

We have experimented with constrained fits for 
a wide variety of other correlators, including fits 
of 30-40 parameters for 4 x 3 matrix G's, simul- 
taneous fits of multiple channels (e.g., ir and p), 
static potentials and glueball masses, and corre- 
lators involving staggered quarks. All work well. 
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Figure 4. The constrained 5-term fit to the local- 
local T correlator. The statistical errors in the 
data points are too small to be resolved in the 
plot. 



In some cases fits are greatly simplified. An ex- 
ample is the fit shown in Fig. for a B meson cor- 
relator made from an NRQCD propagator for the 
b quark and a staggered quark propagator for the 
d quark || . Such fits are complicated by contribu- 
tions from opposite parity states, introduced by 
the staggered quarks, that oscillate with an over- 
all factor (—1)*. Traditional fits have difficulty 
quantifying the importance of these contributions 
since they are small at large t. Constrained fitting 
down to t — 0, however, makes it easy to account 
and correct for them. 

3. THEORY AND INTERPRETATION 

3.1. Bayesian Statistics 

Bayesian statistics provides a useful frame- 
work for understanding the assumptions that go 
into constrained curve fitting || . In this ap- 
proach the analysis is recast in terms of proba- 
bilities: P(A\B) denotes the probability that A 
is true or correct assuming B is true. For com- 
pactness we denote the set of all parameters by 
p = {Ai, Ei . . .}, the set of prior parameters by 
rj = {Ai, &A\ ■ ■ •}) and the Monte Carlo data 
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Figure 5. A constrained fit to a local- local B me- 
son correlator made with NRQCD and staggered 
quark propagators. The energies of the three 
lowest-energy states are listed in lattice units. 
The statistical errors in the data points are too 
small to be resolved in the plot. 



byG. 

We begin with two assumptions. The first is 
that the Monte Carlo data set is sufficiently large 
that G has Gaussian statistics (Central Limit 
Theorem). Then the probability density for ob- 
taining a particular G given a particular theory, 
specified by p, is 

P(G\p) oc e - x2( " )/2 (9) 

where % 2 is defined as in Eq. (^) .[] 

What we need ultimately is the probability that 
a particular set of parameters p is correct given 
the Monte Carlo data — that is, we need P(p\G), 
not P(G\p). P(p\G) is connected to P(G\p) by 
Bayes Theorem:]] 



P( P \G) 



P(G\p)_P(p) 
P(G) 



<xP{G\p)P{p). 



(10) 



1 This is not strictly correct since the correlation matrix 
or , in x 2 will i n general depend upon p, while in practice 
we use a fixed Monte Carlo estimate of it. 
2 This formula follows from the trivial identity for proba- 
bilities: P(pG) = P(G\p) P{p) = P(p\G)P(G). 



Here P(G) is the probability of obtaining a par- 
ticular G from any theory; it is p independent. 
More important is P(p) which is the probability 
that a particular set of parameters p is correct in 
the absence of any new data. It contains what 
we know about the parameters before we begin 
the fit. This is called the "prior" distribution, or 
simply the prior. 

Our second assumption is that the prior distri- 
bution can be approximated by the Gaussian 

P(p) = e">iio r (P)/2 (11) 

where Xprior ^ s defined as in Eq. (^|) . The a priori 
assumption therefore is that pi ~ p~i ± a Pi . With 
this assumption our final probability function is 

(p)/2 



oc e 



is the augmented \ 



(12) 

2 introduced in 



P{p\G) 

wh ere xj 
Section |2.2| . 

The choice of a Gaussian for the prior distribu- 
tion is arbitrary; other choices might well be ap- 
propriate. There is, however, an argument that 
suggests Gaussians. Beyond specifying an aver- 
age value for each parameter and a standard de- 
viation, we want the prior distribution to be as 
unbiased as possible. In general the least biased 
choice for a probability density is the one that 
minimizes the information content, or, equiva- 
lently, maximizes the entropy, 



s 



P(p) log P(p) dp. 



(13) 



Given the constraints (p) = p and (p 2 ) — (p) 2 = 
a 2 , a simple variational calculation shows that 
the entropy is maximized by the Gaussian 

P{p) = —L^ c -(p-p) 2 /^ 2 . (14) 



27TO- 

This argument is less compelling than it seems, 
however, because it relies upon the implicit (and 
arbitrary) assumption that all p's are equally 
likely in the absence of any information about the 
mean or standard deviation. A different parame- 
terization implies a different assumption. 

3.2. Error Estimation 

Given P(p\G) we can compute everything we 
want to know using integrals; in principle no min- 
imization is needed. For example, we obtain a 
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statistical estimate of an arbitrary function of the 
parameters using 



(f(p))=B- 1 e-^^/\f(p)dy 



where 



B = 



-Xau g <»/2 d n p, 



(15) 



(16) 



and the variance is aj = {f 2 ) — (/) 2 , as usual. 
In practice these integrals are quite difficult to 
evaluate for all but the simplest of fits. This is 
because P(p\G) is typically very sharply peaked 
about its maximum. For smaller problems, adap- 
tive Monte Carlo integrators, such as vegas, are 
effective. For larger problems Monte Carlo sim- 
ulation techniques, such as the Metropolis or hy- 
brid Monte Carlo methods, can be effective. Still 
the cost of evaluating the integrals is often pro- 
hibitive, particularly when there are lots of poorly 
constrained parameters (which lead to long, nar- 
row, high ridges in the probability distribution). 
Consequently efficient approximations are useful. 

One approximation assumes that enough 
statistics have been accumulated so that Xaug(p) 
is approximately quadratic in the p's throughout 
the dominant integration region: 



where parameters p = p* minimize Xaug- ^ n this 
quadratic limit we can approximate 



(f(p)) « /GO, 



E^woo^/ooas) 



provided f(p) is sufficiently smooth as well. This 
is the constrained curve fitting procedure outlined 
in Section f2~2[ 

A different, more robust approximation to the 
Bayesian integrals uses a bootstrap analysis, suit- 
ably modified to account for the prior distribu- 
tion H . A bootstrap analysis generates an en- 
semble of p's whose distribution approximates the 
Bayesian distribution P(p\G). Each p in this en- 
semble is obtained by minimizing Xaug, but with 
different Monte Carlo data and different means 
for the priors for each p. We replace G in Xaug 
by the average of a random selection of G's, al- 
lowing duplicates, from the original Monte Carlo 



data ensemble, just as in the standard bootstrap 
method. The new means, p^, for the priors are 
random numbers drawn from a Gaussian distri- 
bution with mean pi and standard deviation a Pi . 
The new means incorporate the effects of the pri- 
ors into the bootstrap p distribution. Given 100 
or 1000 p's generated in this fashion, we then av- 
erage over the ensemble to compute estimates of 
any function f(p). Although this procedure en- 
tails a minimization for each p in the ensemble, 
we find that is generally much faster than Monte 
Carlo evaluation of Bayes integrals (Eq. (fi"5"|)). 

The distribution obtained from this modified 
bootstrap algorithm is not precisely the Bayes 
distribution P(p\G). It has additional factors 
such as yjdetgij where 

_ 2 dG(t;p) dG{t'-p) 



dpi 



dp. 



(19) 



is a metric induced on p space |g| . These factors 
become constants for sufficiently high statistics 
and so make no difference in that limit. This 
particular factor is interesting, however, because 
it makes the measure in p space invariant under 
reparameterizations. This suggests that 



P'(p\G) cx y/det gij e' 



J 2 



(20) 



might be a better choice for our Bayesian prob- 
ability. Then only Xprior would depend upon the 
parameterization of G trl , and Xprior usually has 
little influence on the best-determined fit param- 
eters. 



4. OTHER APPLICATIONS 

One approach to evaluating high-order QCD 
perturbation theory is to simulate the quantity 
of interest at high /3's, where QCD is perturba- 
tive, and fit perturbative expansions to the re- 
sults. An example, from |^|, is the 5x5 Wil- 
son loop, W(5, 5), which was measured in Monte 
Carlo simulations with 9 different lattice spac- 
ings that gave couplings op(3.4/a) covering the 
range 0.01-0.07. The measured values of W(5, 5) 
were fit using an expansion 



A I 



-^jlogW(5,5) = £\ 



(21) 
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Cl 


C2 


C3 


fit: 


1.7693(6) 


-1.201(40) 


4.3(7) 


exact: 


1.7690 


-1.177 






c 4 






fit: 


0.7(4.9) 


0.1(5.0) 




exact: 









Table 1 

Perturbative coefficients for W(b, 5) as deter- 
mined from fits to high-/3 simulations, and exactly 
from Feynman integrals. Refitting with c\ and C2 
held at their exact values gives C3 = 3.91(21). 



where q* — 2.23 /a and M > 5. Priors c n = and 
<5c„ = 5 were used in the fits to obtain the results 
in Table [l] (for M = 5) . The agreement with exact 
results is excellent. Refitting with c\ and C2 set 
equal to their exact values gives C3 = 3.91(21), 
which agrees well with the result in Table but 
which is much more precise. 

The errors for the c„'s again include both sta- 
tistical errors, reflecting the precision of the data, 
and systematic errors, reflecting the degree of our 
ignorance of higher-order coefficients. The low- 
order coefficients (n < 3) are largely determined 
by the Monte Carlo data; their values and errors 
are essentially unchanged if the a Cn 's are doubled, 
or if M is doubled. The high-order coefficients 
are controlled mostly by the priors. Were we to 
increase the statistics, more c„'s would be deter- 
mined by the data. Thus the effective order of 
the fit increases automatically as warranted by 
the statistics. This is one of the most attractive 
aspects of constrained fitting: We don't have to 
agonize about whether a fit should be linear or 
quadratic or. . . ; instead we set M = 10 and allow 
the data to tell us how much it can determine. 

Similar issues arise in extrapolations, for exam- 
ple, to the chiral mass limit or to zero lattice spac- 
ing. By using constrained fits, we can escape the 
limitations of linear or quadratic extrapolations. 
Instead we can include many orders, with con- 
strained coefficients, and allow the Monte Carlo 
data to determine the relevant order automati- 
cally. (See H for an illustration of an extrapola- 
tion in lattice size L.) 



5. VARIATIONS 

5.1. Optimizing Simulations 

In our numerical fits of perturbation theory to 
simulation data (Section 0), the final errors for 
the various perturbative coefficients, c„, are af- 
fected by the op's at which we chose to simu- 
late. Our fitting formalism can be used to de- 
termine the choice of ap's that minimizes these 
errors, thereby optimizing the simulation . To 
illustrate how this is done, we consider a simpler 
problem. We measure a quantity w whose per- 
turbative expansion is 



W = 1 + C\Ctp + C20Lp 



(22) 



For simplicity we ignore all terms beyond second 
order, and we limit ourselves to a single measure- 
ment of w. The challenge is to determine which 
value of ctp (and therefore which lattice spacing) 
will lead to the smallest errors when we determine 
ci from a Monte Carlo measurement, w ± a w , 
of w. This error can be determined from the sec- 
ond derivatives of 



Xaua 



(1 



ciap + C2a>p — w) 2 



4 



4 



(23) 



using Eqns. ( |l7| ) and (|l8|). The last term in this 
equation is our prior; we expect c n sw ± a c . We 
obtain the following formula for the fit error on 

{apol + al )a 2 c 



" (a% + a p )d 2 c + al 



(24) 



where we will assume that a w is independent 
of ap (which is not true for Wilson loops, for 
example, unless numerical roundoff errors domi- 
nate). Note that a Cl — ► a c in cither of the limits 
ap — * or ap — > 00; we would learn nothing 
new from a simulation in either limit. The error 
is minimum at 



ol = a* = \J a w /a c , (25) 
where do~ Cl /da vanishes. At this value 

(26) 



~2 . ~2 
a ci c ~ 



2cr„ 



Thus a = a* is the preferred value for a simula- 
tion; the optimal value depends upon the amount 
of computing that is available (to reduce o~ w ). 
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The ci error decreases like the simula- 

tion error, a w , vanishes. This is rather slow; we 
would prefer an error that vanishes like a w . At 
some small value of a w it will be better to divide 
the available computer resources between simula- 
tions at two values of ap, rather than just one. 
The analysis above can be repeated for this case, 
to determine the two optimal ap's, and the re- 
sulting cr Cl can be compared with that above, to 
determine when to switch from one ap to two. 

Our simple analysis illustrates a generic design 
step that should precede any large-scale simula- 
tion in lattice QCD. By examining Xaug' s f° r ml_ 
portant quantities, we can estimate the optimal 
lattice spacings and quark masses that should be 
used in the simulation before we simulate. Fur- 
thermore we can estimate the optimal fraction of 
the total computing resource to spend on each 
case. Simulation costs should be significantly re- 
duced by such optimizations. 

5.2. Tuning Priors: Empirical Bayes 

The denominator in the Bayes expression, 
Eq. ([To]) , is the probability of obtaining the Monte 
Carlo average G given the prior information: 

P(G) = P(G\ V ) cx / ^f^- d n p, (27) 
which, in the Gaussian limit (Eq. (|17j)), becomes 



x/2 



(28) 



. — , . -v/det Cij 2 
P(G n) oc ^= £ e-x-s.' 

This probability will be small if the priors are in- 
consistent with the data. This suggests that one 
might tune the prior parameters r\i to maximize 
probability P(G\r}), thereby determining the pri- 
ors from the data. This is referred to as the "em- 
pirical Bayes" method. It can be useful if only 
one or two prior parameters is being optimized. 
For example, when fitting the perturbation series 
in Eq. (pi]), one might vary all the a Cn 's together 
to find the a c that maximizes P(G t h|??)- (Note 



that the Gaussian approximation, Eq. (28), is ex- 
act in this case). Then one would fit with all 
a Cn — a c . Note, however, that separately opti- 
mizing all t^'s leads to complete nonsense.^ 

3 The "best" fit is then the p that minimizes x 2 (not Xaug); 
and the fit errors are zero! 



5.3. Nonparameteric Fits and Maximum 
Entropy 

Bayesian methods have previously been used 
for lattice data, and indeed for fitting correla- 
tors |?]] . The priors and parameterizations used 
in these earlier papers, however, are quite differ- 
ent from what we have discussed above. These 
papers express the correlator in terms of its spec- 
tral function, p{to): 



G(t) 



due'"* p{u). 



(29) 



The to axis is divided into a large number of in- 
crements, say 750 or 1000, with centers ujj, and 
the integral is approximated by a sum. The fit 
parameters are the values of p(ujj) for all j. This 
is an example of a "nonparametric" fit. Peaks in 
p(uj) signal bound states. Since the number of 
fit parameters is typically (much) larger than the 
number data points, a prior distribution is essen- 
tial. 

The Maximum Entropy principle provides a 
framework for constructing priors for this pur- 
pose. The spectral function shares several prop- 
erties with probability densities: it has random 
noise, p{uj) > 0, and p(tu)duj is physically mean- 
ingful. This suggests that we think of the spectral 
function as a probability density. The most prob- 
able density is the one with the largest entropy, 
which suggests that we choose a monotonic func- 
tion of the entropy as a prior: for example, 



P{p) oc e a s{p) 



(30) 



is maximum when the entropy S(p) is maximum. 
Using a prior of this form we can fit all of the 
/o(wj)'s to the Monte Carlo data. (Parameter 
a can be optimized using the empirical Bayes 
method; see the previous section.) 

This algorithm has been used to fit the same T 
data that we used for our constrained fits § . The 
errors that result for the first excited state (£2) 
are 2-3 times larger than we obtained from our 
constrained fits in Section 2.2. This is not surpris- 
ing. The maximum entropy prior contains far less 
information than we included in the priors for the 
constrained fits, and that is its weakness in this 
context. One should use all the prior information 
one has when fitting data. The maximum entropy 
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prior is useful when there is little initial informa- 
tion and no simple parameterization of the theory, 
as is the case in image processing, for example. 



6. CONCLUSION 

Constrained curve fitting is a major improve- 
ment over the traditional fitting procedures used 
in lattice QCD analyses. Neither the data nor the 
theory is truncated. Consequently more informa- 
tion, about excited states or higher-order terms, 
is extracted from the same Monte Carlo simu- 
lations. Most importantly these techniques pro- 
vide a transparent and controlled procedure for 
including systematic errors due to the infinitely 
many parameters in the theory that are not con- 
strained by the Monte Carlo data. Constrained 
curve fitting is useful for any kind of analysis that 
involves correlators. It is also useful for extrapo- 
lating to small quark masses or lattice spacings; 
the order of extrapolation increases automatically 
to match the statistical precision of the data. Fi- 
nally we can use the formalism to estimate the 
quark masses and lattice spacings that optimize 
a large-scale simulation given a finite computing 
resource. 

Priors make some people uneasy because the 
constraints are added information, beyond the 
Monte Carlo data, that can bias the fit results. 
Such people might argue that one should "let the 
data speak for itself." However, it is precisely our 
ability to fold extra information into the fit that 
makes constrained curve fitting so much more 
powerful and useful than unconstrained fitting. 
Better a priori knowledge, for example from ear- 
lier analyses, results in smaller cr's and smaller 
systematic errors. No a priori knowledge, which 
corresponds to the unconstrained case, (correctly) 
results in infinite systematic errors. Traditional 
fitting is a special case of constrained fitting, with 
infinitely broad priors for the first few parame- 
ters and infinitely narrow priors, centered around 
zero, for the others. The priors we use in our 
constrained fits are far more reasonable. 
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